Introduction
This project analyzes state-wise COVID-19 data for the United States, focusing on trends, comparisons, and correlations within the data. We use the ggplot2 package in R to explore and visualize the data, with the aim of drawing meaningful insights about the pandemic’s impact at both the national and the state level.
The data was obtained from the official website of The COVID Tracking Project: https://covidtracking.com/data/download
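A minimal sketch of loading the downloaded file, assuming it has been saved locally as a CSV. The helper name `load_covid` and the file name in the usage comment are placeholders chosen for illustration, not something fixed by the project.

```r
# Hypothetical helper: read the state-level history CSV downloaded from
# The COVID Tracking Project. The path is an assumption; adjust as needed.
load_covid <- function(path) {
  # Keep text columns (date, state) as character rather than factor
  read.csv(path, stringsAsFactors = FALSE)
}

# Usage (file name is a placeholder):
# covid <- load_covid("all-states-history.csv")
# dim(covid)
```

Reading the text columns as character keeps the raw values untouched; any type conversion can then be done explicitly in a later step.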
Objectives:
To explore and visualize state-wise trends across the country regarding COVID-19 cases, deaths, hospitalizations, and testing.
To compare the pandemic’s impact across different states of the country.
To explore the correlations between various indicators such as positive cases, testing rates, and hospitalizations.
To identify patterns and outliers in the data.
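As a rough illustration of the first objective, the sketch below draws daily new cases over time for a handful of states with ggplot2. The helper name `plot_state_trend` and the data frame name `covid` are assumptions made for this example; only the column names date, state and positiveIncrease come from the data set.

```r
library(ggplot2)

# Hypothetical helper: line plot of daily new cases for selected states.
# Assumes `df` has the columns date, state and positiveIncrease.
plot_state_trend <- function(df, states) {
  sub <- df[df$state %in% states, ]
  sub$date <- as.Date(sub$date)  # no-op if date is already a Date
  ggplot(sub, aes(x = date, y = positiveIncrease, colour = state)) +
    geom_line() +
    labs(x = "Date", y = "Daily increase in positive cases", colour = "State")
}

# Usage, with `covid` being the loaded data frame (name is an assumption):
# plot_state_trend(covid, c("CA", "AZ", "AL"))
```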
Data-set description
| Variable | Description |
|---|---|
| date | Date on which data was collected by The COVID Tracking Project. |
| state | Two-letter abbreviation for the state or territory. |
| death | Total fatalities with confirmed or probable COVID-19 case diagnosis. |
| deathConfirmed | Total fatalities with confirmed COVID-19 case diagnosis. |
| deathIncrease | Daily increase in death, calculated from the previous day’s value. |
| deathProbable | Total fatalities with probable COVID-19 case diagnosis. |
| hospitalized | Total number of unique individuals who have ever been hospitalized with COVID-19. |
| hospitalizedCumulative | Total number of individuals who have ever been hospitalized with COVID-19. |
| hospitalizedCurrently | Individuals who are currently hospitalized with COVID-19. |
| hospitalizedIncrease | Daily increase in hospitalizedCumulative, calculated from the previous day’s value. |
| inIcuCumulative | Total number of individuals who have ever been hospitalized in the Intensive Care Unit with COVID-19. |
| inIcuCurrently | Individuals who are currently hospitalized in the Intensive Care Unit with COVID-19. |
| negative | Total number of unique people with a completed PCR test that returns negative. |
| negativeIncrease | Daily increase in negative. |
| negativeTestsAntibody | The total number of completed antibody tests that return negative, as reported by the state or territory. |
| negativeTestsPeopleAntibody | The total number of unique people with completed antibody tests that return negative, as reported by the state or territory. |
| negativeTestsViral | Total number of completed PCR tests (or specimens tested) that return negative, as reported by the state or territory. |
| onVentilatorCumulative | Total number of individuals who have ever been hospitalized under advanced ventilation with COVID-19. |
| onVentilatorCurrently | Individuals who are currently hospitalized under advanced ventilation with COVID-19. |
| positive | Total number of confirmed plus probable cases of COVID-19 reported by the state or territory. |
| positiveCasesViral | Total number of unique people with a positive PCR or other approved nucleic acid amplification test (NAAT), as reported by the state or territory. |
| positiveIncrease | The daily increase in the field positive, which measures cases (confirmed plus probable), calculated from the previous day’s value. |
| positiveScore | |
| positiveTestsAntibody | Total number of completed antibody tests that return positive, as reported by the state or territory. |
| positiveTestsAntigen | Total number of completed antigen tests that return positive, as reported by the state or territory. |
| positiveTestsPeopleAntibody | The total number of unique people with completed antibody tests that return positive, as reported by the state or territory. |
| positiveTestsPeopleAntigen | Total number of unique people with a completed antigen test that returned positive, as reported by the state or territory. |
| positiveTestsViral | Total number of completed PCR tests (or specimens tested) that return positive, as reported by the state or territory. |
| recovered | Total number of people that are identified as recovered from COVID-19. Types of “recovered” cases include those who are discharged from hospitals, released from isolation, or those who have not been identified as fatalities after a number of days (30 or more) post disease onset. |
| totalTestEncountersViral | Total number of people tested per day via PCR testing, as reported by the state or territory. |
| totalTestEncountersViralIncrease | Daily increase in totalTestEncountersViral. |
| totalTestResults | In most states, the totalTestResults field is currently computed by adding positive and negative values because, historically, some states do not report totals, and to work around different reporting cadences for cases and tests. |
| totalTestResultsIncrease | Daily increase in totalTestResults, calculated from the previous day’s value. This calculation includes all the caveats associated with Total tests/totalTestResults, and using it at the state/territory level is recommended against, so we will not use it in this project. |
| totalTestsAntibody | Total number of completed antibody tests, as reported by the state or territory. |
| totalTestsAntigen | Total number of completed antigen tests, as reported by the state or territory. |
| totalTestsPeopleAntibody | The total number of unique people who have been tested at least once via antibody testing, as reported by the state or territory. |
| totalTestsPeopleAntigen | Total number of unique people who have been tested at least once via antigen testing, as reported by the state or territory. |
| totalTestsPeopleViral | Total number of unique people tested at least once via PCR testing, as reported by the state or territory. |
| totalTestsPeopleViralIncrease | Daily increase in totalTestsPeopleViral. |
| totalTestsViral | Total number of PCR tests (or specimens tested), as reported by the state or territory. |
| totalTestsViralIncrease | Daily increase in totalTestsViral. |
Data-set definitions sourced from The COVID Tracking Project website: https://covidtracking.com/about-data/data-definitions
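Several of the fields above (for example deathIncrease and hospitalizedIncrease) are described as the daily increase in a cumulative field, calculated from the previous day’s value. The sketch below shows one way such a column could be derived in base R. The helper name `daily_increase`, the state codes and the numbers are made up for illustration, and whether the published columns were computed exactly this way is not confirmed here.

```r
# Hypothetical helper: per-state day-over-day difference of a cumulative column.
daily_increase <- function(df, column) {
  df <- df[order(df$state, df$date), ]  # sort by state, then by date
  # Within each state, subtract the previous day's value; the first day gets NA
  df$increase <- ave(df[[column]], df$state,
                     FUN = function(x) c(NA, diff(x)))
  df
}

# Toy data (made-up values, not taken from the data set)
toy <- data.frame(
  date  = as.Date(c("2021-03-05", "2021-03-06", "2021-03-07",
                    "2021-03-05", "2021-03-06", "2021-03-07")),
  state = c("AA", "AA", "AA", "BB", "BB", "BB"),
  death = c(100, 104, 103, 10, 10, 15)
)

daily_increase(toy, "death")$increase
# NA 4 -1 NA 0 5
```

Note that a downward revision of a cumulative count (104 to 103 in the toy data) produces a negative increase under this definition.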
Summary statistics by variable (date and state are character columns of length 20780):

| Variable | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. | NA's |
|---|---|---|---|---|---|---|---|
| death | 0.0 | 161.2 | 1108.0 | 3682.2 | 4387.5 | 54124.0 | 850 |
| deathConfirmed | 0 | 607 | 2410 | 3770 | 5462 | 21177 | 11358 |
| deathIncrease | -201.00 | 0.00 | 6.00 | 24.79 | 24.00 | 2559.00 | 0 |
| deathProbable | 0.0 | 79.0 | 216.0 | 417.3 | 460.0 | 2594.0 | 13187 |
| hospitalized | 1.0 | 985.2 | 4472.0 | 9262.8 | 12248.5 | 82237.0 | 8398 |
| hospitalizedCumulative | 1.0 | 985.2 | 4472.0 | 9262.8 | 12248.5 | 82237.0 | 8398 |
| hospitalizedCurrently | 0.0 | 166.5 | 531.0 | 1190.6 | 1279.0 | 22851.0 | 3441 |
| hospitalizedIncrease | -12257.00 | 0.00 | 0.00 | 37.36 | 36.00 | 16373.00 | 0 |
| inIcuCumulative | 6 | 501 | 1295 | 1934 | 2451 | 9263 | 16991 |
| inIcuCurrently | 0.0 | 60.0 | 172.0 | 359.6 | 380.0 | 5225.0 | 9144 |
| negative | 0 | 53941 | 305972 | 848225 | 1056611 | 10186941 | 7490 |
| negativeIncrease | -968686.0 | 0.0 | 141.5 | 3589.1 | 3916.0 | 212974.0 | 0 |
| negativeTestsAntibody | 587 | 11242 | 78888 | 145581 | 162926 | 864153 | 19322 |
| negativeTestsPeopleAntibody | 1 | 54874 | 100282 | 188711 | 261121 | 816231 | 19808 |
| negativeTestsViral | 1 | 303300 | 936600 | 1818574 | 2316865 | 16887410 | 15756 |
| onVentilatorCumulative | 32.0 | 220.2 | 412.0 | 574.7 | 818.0 | 1533.0 | 19490 |
| onVentilatorCurrently | 0.0 | 29.0 | 86.0 | 151.6 | 185.0 | 2425.0 | 11654 |
| positive | 0 | 5754 | 46064 | 165156 | 177958 | 3501394 | 188 |
| positiveCasesViral | 0 | 10376 | 68442 | 178662 | 202425 | 3501394 | 6534 |
| positiveIncrease | -7757 | 65 | 435 | 1384 | 1335 | 71734 | 0 |
| positiveScore | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| positiveTestsAntibody | 0 | 852 | 8624 | 19811 | 25900 | 190026 | 17434 |
| positiveTestsAntigen | 0 | 1085 | 13661 | 31837 | 49010 | 211546 | 18547 |
| positiveTestsPeopleAntibody | 0 | 3156 | 11956 | 20517 | 19059 | 178979 | 19686 |
| positiveTestsPeopleAntigen | 3 | 2682 | 17763 | 25259 | 47012 | 81803 | 20147 |
| positiveTestsViral | 0 | 16159 | 65359 | 198500 | 224680 | 2628176 | 11822 |
| recovered | 2 | 3379 | 17618 | 94242 | 93152 | 2502609 | 8777 |
| totalTestEncountersViral | 0 | 193794 | 905322 | 2702109 | 2780542 | 39695100 | 15549 |
| totalTestEncountersViralIncrease | -16946 | 0 | 0 | 5578 | 0 | 324671 | 0 |
| totalTestResults | 0 | 104050 | 655267 | 2186936 | 2264766 | 49646014 | 166 |
| totalTestResultsIncrease | -130545 | 1206 | 6125 | 17508 | 19086 | 473076 | 0 |
| totalTestsAntibody | 0 | 18965 | 84652 | 163403 | 230011 | 1054711 | 15991 |
| totalTestsAntigen | 1 | 20047 | 123384 | 308920 | 432727 | 2664340 | 17359 |
| totalTestsPeopleAntibody | 1 | 54913 | 103968 | 165432 | 183103 | 995580 | 18580 |
| totalTestsPeopleAntigen | 3 | 37676 | 144130 | 168188 | 255251 | 580372 | 19781 |
| totalTestsPeopleViral | 0 | 141470 | 419372 | 965011 | 1229298 | 11248247 | 11583 |
| totalTestsPeopleViralIncrease | -1043744 | 0 | 0 | 2740 | 2478 | 820817 | 0 |
| totalTestsViral | 0 | 132460 | 731651 | 2304555 | 2496925 | 49646014 | 6264 |
| totalTestsViralIncrease | -1154583 | 0 | 1896 | 12961 | 12441 | 2164543 | 0 |
The summary shows a large number of missing values for some variables. Because of the complexity of the attributes and the way the data was constructed, we will neither remove nor impute them, since doing so would only make the analysis and visualizations inaccurate.
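The share of missing values per variable can be computed directly, which makes the scale of the gaps concrete. The sketch below uses three columns from the six rows of 2021-03-07 shown in the data preview further down, so the values are real but the sample is tiny; on the full data frame the same call would cover every column.

```r
# Six rows (AK, AL, AR, AS, AZ, CA) copied from the data preview
rows <- data.frame(
  state          = c("AK", "AL", "AR", "AS", "AZ", "CA"),
  death          = c(305, 10148, 5319, 0, 16328, 54124),
  deathConfirmed = c(NA, 7963, 4308, NA, 14403, NA),
  deathProbable  = c(NA, 2185, 1011, NA, 1925, NA)
)

# Fraction of missing values in each column
na_share <- colMeans(is.na(rows))
na_share
# state 0, death 0, deathConfirmed 0.5, deathProbable 0.5
```

Leaving the NA values in place means that functions such as `mean()` or `sum()` need `na.rm = TRUE` when a statistic over the observed values is wanted.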
Additionally, the variables need to be categorized properly before moving on to the visualization and analysis.
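One plausible reading of this step, given that the summary reports date and state as character, is to convert date to a `Date` and state to a factor. That interpretation is an assumption, as the exact categorization is not spelled out here. The sketch uses three rows copied from the data preview.

```r
# Three rows copied from the data preview; columns limited for brevity
rows <- data.frame(
  date  = c("2021-03-07", "2021-03-07", "2021-03-07"),
  state = c("AK", "AL", "AR"),
  death = c(305, 10148, 5319),
  stringsAsFactors = FALSE
)

rows$date  <- as.Date(rows$date, format = "%Y-%m-%d")  # character -> Date
rows$state <- factor(rows$state)                        # character -> factor

str(rows)
```

With date stored as a `Date`, ggplot2 can place it on a continuous time axis, and a factor state gives a discrete grouping variable.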
First six rows of the data, shown transposed (one column per row of the data, one line per variable):

| Variable | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| date | 2021-03-07 | 2021-03-07 | 2021-03-07 | 2021-03-07 | 2021-03-07 | 2021-03-07 |
| state | AK | AL | AR | AS | AZ | CA |
| death | 305 | 10148 | 5319 | 0 | 16328 | 54124 |
| deathConfirmed | NA | 7963 | 4308 | NA | 14403 | NA |
| deathIncrease | 0 | -1 | 22 | 0 | 5 | 258 |
| deathProbable | NA | 2185 | 1011 | NA | 1925 | NA |
| hospitalized | 1293 | 45976 | 14926 | NA | 57907 | NA |
| hospitalizedCumulative | 1293 | 45976 | 14926 | NA | 57907 | NA |
| hospitalizedCurrently | 33 | 494 | 335 | NA | 963 | 4291 |
| hospitalizedIncrease | 0 | 0 | 11 | 0 | 44 | 0 |
| inIcuCumulative | NA | 2676 | NA | NA | NA | NA |
| inIcuCurrently | NA | NA | 141 | NA | 273 | 1159 |
| negative | NA | 1931711 | 2480716 | 2140 | 3073010 | NA |
| negativeIncrease | 0 | 2087 | 3267 | 0 | 13678 | 0 |
| negativeTestsAntibody | NA | NA | NA | NA | NA | NA |
| negativeTestsPeopleAntibody | NA | NA | NA | NA | NA | NA |
| negativeTestsViral | 1660758 | NA | 2480716 | NA | NA | NA |
| onVentilatorCumulative | NA | 1515 | 1533 | NA | NA | NA |
| onVentilatorCurrently | 2 | NA | 65 | NA | 143 | NA |
| positive | 56886 | 499819 | 324818 | 0 | 826454 | 3501394 |
| positiveCasesViral | NA | 392077 | 255726 | 0 | 769935 | 3501394 |
| positiveIncrease | 0 | 408 | 165 | 0 | 1335 | 3816 |
| positiveScore | 0 | 0 | 0 | 0 | 0 | 0 |
| positiveTestsAntibody | NA | NA | NA | NA | NA | NA |
| positiveTestsAntigen | NA | NA | NA | NA | NA | NA |
| positiveTestsPeopleAntibody | NA | NA | NA | NA | NA | NA |
| positiveTestsPeopleAntigen | NA | NA | 81803 | NA | NA | NA |
| positiveTestsViral | 68693 | NA | NA | NA | NA | NA |
| recovered | NA | 295690 | 315517 | NA | NA | NA |
| totalTestEncountersViral | NA | NA | NA | NA | NA | NA |
| totalTestEncountersViralIncrease | 0 | 0 | 0 | 0 | 0 | 0 |
| totalTestResults | 1731628 | 2323788 | 2736442 | 2140 | 7908105 | 49646014 |
| totalTestResultsIncrease | 0 | 2347 | 3380 | 0 | 45110 | 133186 |
| totalTestsAntibody | NA | NA | NA | NA | 580569 | NA |
| totalTestsAntigen | NA | NA | NA | NA | NA | NA |
| totalTestsPeopleAntibody | NA | 119757 | NA | NA | 444089 | NA |
| totalTestsPeopleAntigen | NA | NA | 481311 | NA | NA | NA |
| totalTestsPeopleViral | NA | 2323788 | NA | NA | 3842945 | NA |
| totalTestsPeopleViralIncrease | 0 | 2347 | 0 | 0 | 14856 | 0 |
| totalTestsViral | 1731628 | NA | 2736442 | 2140 | 7908105 | 49646014 |
| totalTestsViralIncrease | 0 | 0 | 3380 | 0 | 45110 | 133186 |
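The description of totalTestResults states that in most states the field is computed by adding positive and negative. A quick sanity check of that rule against the six rows above can be sketched as follows; the column `matches_sum` is a name introduced here for illustration, and six rows from a single date say nothing definitive about the full data set.

```r
# The six rows from the preview above (2021-03-07), limited to three columns
rows <- data.frame(
  state            = c("AK", "AL", "AR", "AS", "AZ", "CA"),
  positive         = c(56886, 499819, 324818, 0, 826454, 3501394),
  negative         = c(NA, 1931711, 2480716, 2140, 3073010, NA),
  totalTestResults = c(1731628, 2323788, 2736442, 2140, 7908105, 49646014)
)

# TRUE where the total equals positive + negative; NA where negative is missing
rows$matches_sum <- rows$positive + rows$negative == rows$totalTestResults
rows[, c("state", "matches_sum")]
```

In these six rows the sum matches only for AS and cannot be checked for AK and CA because negative is missing, so the rule should not be assumed to hold for every state and date.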